De novo transcriptome assembly and annotation of the third stage larvae of the zoonotic parasite Anisakis pegreffii

Objectives Anisakis pegreffii is a zoonotic parasite that requires marine organisms to complete its life history. Human infection (anisakiasis) occurs when third-stage larvae (L3) are accidentally ingested with raw or undercooked infected fish or squid. A new de novo transcriptome of A. pegreffii was generated here to provide a robust, "ready-to-use" resource for functional studies on the genes and gene products of A. pegreffii involved in the molecular mechanisms of parasite-host interaction. Data description An RNA-seq library of A. pegreffii L3 was newly generated using an Illumina TruSeq platform. It was combined with five other RNA-seq datasets previously obtained from L3 of the same species and stored in the NCBI SRA. The final dataset was analyzed with three assembler programs and two validation tools. This pipeline produced a high-confidence protein-coding transcriptome of A. pegreffii. These data represent a more robust and complete transcriptome of this species than the currently available resources. This is important for understanding the adaptive and immunomodulatory genes implicated in the “cross talk” between the parasite and its hosts, including the accidental one (humans).

between 30°S and 60°S. In humans, the accidental ingestion of third-stage larvae (L3) through the consumption of infected raw, undercooked, or improperly processed fish causes a zoonosis known as anisakiasis. Of the nine currently recognized biological species of the genus, only A. pegreffii and A. simplex (s.s.) have so far been reported to cause anisakiasis [1,3,4].
The investigation of genes and proteins of A. pegreffii is crucial for understanding the parasite's biological functions and its adaptation to abiotic and biotic conditions. It is also fundamental to improving knowledge of the molecular mechanisms involved in the evolutionary host-parasite interaction. Additionally, the molecules involved in the interaction between A. pegreffii and humans have not yet been elucidated. Finally, the absence of a suitable reference genome for this parasite species makes those goals difficult to achieve. Although several RNA-seq analyses of A. pegreffii L3 under different experimental conditions and from different larval tissues have been carried out [5–9], a complete "ready to use" transcriptome is missing.
The objective of this research was to provide a robust, high-confidence protein-coding transcriptome of the L3 stage of A. pegreffii, obtained by assembling the data newly generated in the present study together with those previously stored. The result is intended to be a more accurate de novo reference transcriptome of A. pegreffii that will help shed light on the genes implicated in the "cross-talk" between the parasite and its natural and accidental hosts.

Data description
The input dataset for the de novo assembly of A. pegreffii L3 was composed of six RNA-seq datasets (Table 1, Data files 1 and 2): one obtained in the present study (PRJNA752284) (Table 1, Data file 2) and five retrieved from the NCBI SRA (PRJNA589243, PRJNA602791, PRJNA374530, PRJNA316941, PRJNA312925). To obtain the RNA-seq dataset of this study, A. pegreffii L3 collected from the viscera of fish from the Mediterranean Sea were maintained in in vitro culture for 24 h. RNA and DNA were extracted from nine L3 using TRIzol reagent, as previously described [10,11]. The RNA extracted from each group of three L3 was pooled, and its quantity was checked using an Agilent 2100 Bioanalyzer. The cDNA library was prepared using the TruSeq Stranded mRNA kit (Illumina). Ligated products of 200 bp were excised from agarose gels and PCR amplified. Products were single-end sequenced on an Illumina TruSeq platform. Genetic/molecular identification of A. pegreffii L3 was performed by sequence analysis of mitochondrial (mtDNA cox2) and nuclear (EF1 α-1 nDNA, nas 10 nDNA) gene loci, as previously described [12].
Bioinformatic analysis was performed on a High-Performance-Computing platform [13]. For each bioproject, read quality was checked with FastQC v.0.11.2 before and after the trimming step (Trimmomatic v.0.39 [14]). The quality assessment metrics for all trimmed data were aggregated with MultiQC v.1.9 [15]. Data file 3 (Table 1) shows, for all samples in the six analyzed bioprojects, the mean read counts per quality score and the mean quality score at each base position, with mean quality scores higher than 35. A total of 393,512,048 cleaned reads (97% of all raw reads) were obtained after removal of the low-quality reads.
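The quality-control steps described above (FastQC before and after Trimmomatic, then MultiQC aggregation) can be sketched as a small command builder. This is only an illustrative sketch, not the pipeline actually used: the trimming settings (SLIDINGWINDOW:4:20, MINLEN:36), the jar file name, the output naming scheme, and the function names are assumptions, since the text does not report them, and the single-end mode reflects only the library generated in this study.

```python
from pathlib import PurePosixPath

# ASSUMPTION: the source does not report the trimming parameters;
# these are common Trimmomatic steps used here only for illustration.
TRIM_STEPS = ["SLIDINGWINDOW:4:20", "MINLEN:36"]


def qc_commands(fastq, out_dir, trimmomatic_jar="trimmomatic-0.39.jar"):
    """Build the per-sample QC commands for one single-end FASTQ file.

    Returns a list of argument lists: FastQC on the raw reads,
    Trimmomatic in single-end mode, and FastQC on the trimmed reads.
    Nothing is executed here; the commands are only assembled.
    """
    fastq = PurePosixPath(fastq)
    out_dir = PurePosixPath(out_dir)
    # Hypothetical naming scheme for the trimmed output file.
    trimmed = out_dir / f"trimmed_{fastq.name}"
    return [
        # Quality report on the raw reads.
        ["fastqc", "-o", str(out_dir), str(fastq)],
        # Single-end trimming with the assumed steps above.
        ["java", "-jar", trimmomatic_jar, "SE", "-phred33",
         str(fastq), str(trimmed), *TRIM_STEPS],
        # Quality report on the trimmed reads.
        ["fastqc", "-o", str(out_dir), str(trimmed)],
    ]


def multiqc_command(out_dir):
    """Build the MultiQC command that aggregates all reports in out_dir."""
    return ["multiqc", str(out_dir), "-o", str(out_dir)]


if __name__ == "__main__":
    # Hypothetical file and directory names.
    for cmd in qc_commands("reads/sample_L3.fastq.gz", "qc"):
        print(" ".join(cmd))
    print(" ".join(multiqc_command("qc")))
```

Each argument list could be passed to subprocess.run once the tools are installed and the output directory exists. Paired-end datasets among the retrieved bioprojects would need Trimmomatic's PE mode instead.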
In order to construct a robust de novo transcriptome, three assembly tools with a multi-kmer approach were adopted: Trinity v.2.11.0 [16]

Limitations
The A. pegreffii transcriptome obtained here was assembled only from RNA-seq datasets of the third larval stage of the parasite. The single transcriptome available from the fourth-stage larva of A. pegreffii [8] was not included because the main aim of this analysis was to provide a robust and "ready to use" transcriptome of the infective stage (third larval stage), which is also the stage that causes the zoonotic disease (anisakiasis) in humans.